Introduction

This project is an analysis of fast food restaurants in the United States as of May 2019. This data, collected by Datafiniti and available on Kaggle (https://www.kaggle.com/datafiniti/fast-food-restaurants), has information on 10,000 fast food restaurants in the US, including name, address, city, and more.

Using this data, I hope to better understand the fast food market in America. Fast food is all around us, but we rarely think about it as a whole. To do so, I break the analysis into three questions:

  1. What are the most popular fast food restaurants in the US?
  2. In what US location is fast food most abundant?
  3. Is the number of fast food restaurants related to population?

Data Import and First Look

Library imports:

library(ggplot2)
library(readr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tm)
## Loading required package: NLP
## 
## Attaching package: 'NLP'
## The following object is masked from 'package:ggplot2':
## 
##     annotate
library(usmap)
library(knitr)

Next, I import the dataset and use head() to examine the first few observations.

ff_data <- read_csv("Datafiniti_Fast_Food_Restaurants.csv")
## Parsed with column specification:
## cols(
##   id = col_character(),
##   dateAdded = col_datetime(format = ""),
##   dateUpdated = col_datetime(format = ""),
##   address = col_character(),
##   categories = col_character(),
##   city = col_character(),
##   country = col_character(),
##   keys = col_character(),
##   latitude = col_double(),
##   longitude = col_double(),
##   name = col_character(),
##   postalCode = col_character(),
##   province = col_character(),
##   sourceURLs = col_character(),
##   websites = col_character()
## )
kable(head(ff_data))
| id | dateAdded | dateUpdated | address | categories | city | country | keys | latitude | longitude | name | postalCode | province | sourceURLs | websites |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AVwcmSyZIN2L1WUfmxyw | 2015-10-19 23:47:58 | 2018-06-26 03:00:14 | 800 N Canal Blvd | American Restaurant and Fast Food Restaurant | Thibodaux | US | us/la/thibodaux/800ncanalblvd/1780593795 | 29.81470 | -90.81474 | SONIC Drive In | 70301 | LA | https://foursquare.com/v/sonic-drive-in/4b73615df964a520abab2de3/menu,https://foursquare.com/v/sonic-drive-in/4b73615df964a520abab2de3,http://tripadvisor.com/Restaurant_Review-g40459-d4654052-Reviews-Sonic_Drive_In-Thibodaux_Louisiana.html,https://www.yellowpages.com/thibodaux-la/mip/sonic-drive-in-468367546 | https://locations.sonicdrivein.com/la/thibodaux/800-north-canal-boulevard.html,http://sonicdrivein.com,http://www.sonicdrivein.com |
| AVwcmSyZIN2L1WUfmxyw | 2015-10-19 23:47:58 | 2018-06-26 03:00:14 | 800 N Canal Blvd | Fast Food Restaurants | Thibodaux | US | us/la/thibodaux/800ncanalblvd/1780593795 | 29.81470 | -90.81474 | SONIC Drive In | 70301 | LA | https://foursquare.com/v/sonic-drive-in/4b73615df964a520abab2de3/menu,https://foursquare.com/v/sonic-drive-in/4b73615df964a520abab2de3,http://tripadvisor.com/Restaurant_Review-g40459-d4654052-Reviews-Sonic_Drive_In-Thibodaux_Louisiana.html,https://www.yellowpages.com/thibodaux-la/mip/sonic-drive-in-468367546 | https://locations.sonicdrivein.com/la/thibodaux/800-north-canal-boulevard.html,http://sonicdrivein.com,http://www.sonicdrivein.com |
| AVwcopQoByjofQCxgfVa | 2016-03-29 05:06:36 | 2018-06-26 02:59:52 | 206 Wears Valley Rd | Fast Food Restaurant | Pigeon Forge | US | us/tn/pigeonforge/206wearsvalleyrd/-864103396 | 35.80379 | -83.58055 | Taco Bell | 37863 | TN | https://www.yellowpages.com/pigeon-forge-tn/mip/taco-bell-474241430,https://foursquare.com/v/taco-bell/4ded6885d22deb0316df557d/menu,https://foursquare.com/v/taco-bell/4ded6885d22deb0316df557d | http://www.tacobell.com,https://locations.tacobell.com/tn/pigeon-forge/206-wears-valley-road.html?utm_source=yextandutm_campaign=yextpowerlistingsandutm_medium=referralandutm_term=026432andutm_content=website |
| AVweXN5RByjofQCxxilK | 2017-01-03 07:46:11 | 2018-06-26 02:59:51 | 3652 Parkway | Fast Food | Pigeon Forge | US | us/tn/pigeonforge/3652parkway/93075755 | 35.78234 | -83.55141 | Arby’s | 37863 | TN | http://www.yellowbook.com/profile/arbys_1633893026.html,https://foursquare.com/v/arbys/4bae29f8f964a520348c3be3,https://www.yellowpages.com/pigeon-forge-tn/mip/arbys-6911678,https://www.allmenus.com/tn/pigeon-forge/166343-arbys/menu/,http://tripadvisor.com/Restaurant_Review-g55270-d1123265-Reviews-Arby_s-Pigeon_Forge_Tennessee.html,http://www.citysearch.com/profile/9409306/pigeon_forge_tn/arby_s.html | http://www.arbys.com,https://locations.arbys.com/us/tn/pigeon-forge/3652-parkway.html |
| AWQ6MUvo3-Khe5l_j3SG | 2018-06-26 02:59:43 | 2018-06-26 02:59:43 | 2118 Mt Zion Parkway | Fast Food Restaurant | Morrow | US | us/ga/morrow/2118mtzionparkway/1305117222 | 33.56274 | -84.32114 | Steak ’n Shake | 30260 | GA | https://foursquare.com/v/steak-n-shake/4bcf77a741b9ef3bb87df8e5 | http://www.steaknshake.com/locations/23851-steak-n-shake-mt-zion-parkway-morrow |
| AVwc57jLkufWRAb50ROs | 2015-10-23 23:59:49 | 2018-06-26 02:59:43 | 9768 Grand River Ave | Fast Food Restaurant | Detroit | US | us/mi/detroit/9768grandriverave/-791445730 | 42.36882 | -83.13825 | Wendy’s | 48204 | MI | https://foursquare.com/v/wendys/4bfec191e584c928932f6d25,http://tripadvisor.com/Restaurant_Review-g42139-d4455438-Reviews-Wendy_s-Detroit_Michigan.html,http://www.yellowpages.com/detroit-mi/mip/wendys-5831913 | http://www.wendys.com |
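The first two rows above share the same id and differ only in their categories, which suggests the raw file can list a single location more than once. A minimal base R check for repeated ids, run here only on the six ids printed above; on the full data the equivalent would be sum(duplicated(ff_data$id)), run before the id column is dropped:

```r
# The six ids printed by kable(head(ff_data)) above
ids <- c("AVwcmSyZIN2L1WUfmxyw", "AVwcmSyZIN2L1WUfmxyw", "AVwcopQoByjofQCxgfVa",
         "AVweXN5RByjofQCxxilK", "AWQ6MUvo3-Khe5l_j3SG", "AVwc57jLkufWRAb50ROs")

# duplicated() is TRUE for every occurrence of a value after its first one
dup <- duplicated(ids)

sum(dup)          # how many rows repeat an earlier id
unique(ids[dup])  # which ids repeat
```

This only flags repeats; whether to drop them is a separate judgment call, since the two rows carry different category strings.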

This first look shows 15 columns, many of which (IDs, timestamps, coordinates, source URLs) are not needed here. To keep the dataset organized and focused on the analysis, I trim it to the three columns used below.

ff_data <- ff_data[,c("city", "name", "province")]
kable(head(ff_data))
| city | name | province |
|---|---|---|
| Thibodaux | SONIC Drive In | LA |
| Thibodaux | SONIC Drive In | LA |
| Pigeon Forge | Taco Bell | TN |
| Pigeon Forge | Arby’s | TN |
| Morrow | Steak ’n Shake | GA |
| Detroit | Wendy’s | MI |

The data is now simplified to 10,000 rows and 3 columns: city, name, and province (state).

To clean the names, we convert them to lower case and remove punctuation. This catches duplicate spellings of the same chain, for example Chick-fil-a vs. Chick-Fil-A. Some repeats will remain where a chain appears under different versions of its name; for example, "five guys" and "five guys burgers and fries" will not be combined. However, this does not appear to have a disruptive effect on the analysis.

ff_data$name <- tolower(ff_data$name)
ff_data$name <- removePunctuation(ff_data$name)
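To make the effect of this cleaning concrete, here is a small sketch on a few hand-written name variants. It uses a base R stand-in, normalize_name (a hypothetical helper), that lower-cases and then deletes characters in the [:punct:] class, which is roughly what tm's removePunctuation() does with its default settings:

```r
# Hypothetical helper: lower-case, then strip runs of [:punct:] characters
normalize_name <- function(x) gsub("[[:punct:]]+", "", tolower(x))

variants <- c("Chick-fil-A", "Chick-Fil-A", "Five Guys", "Five Guys Burgers and Fries")
cleaned <- normalize_name(variants)

cleaned                  # the two Chick-fil-A spellings both become "chickfila"
length(unique(cleaned))  # 3: the Five Guys names stay separate
```

Both Chick-fil-A spellings collapse to one value, while the two Five Guys names remain distinct, matching the limitation described above.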

With the data examined and trimmed to what we need, we can now dive into the analysis.

Data Analysis

Question 2: In what US location is fast food most abundant?

To begin answering our next question, I want to look at the restaurants by state.

state_ff <- ff_data %>%
  group_by(province) %>%
  summarise(number = n()) %>%   # count restaurants per state
  rename(state = province)      # plot_usmap() expects a column named "state"
kable(head(state_ff))
| state | number |
|---|---|
| AK | 16 |
| AL | 6 |
| AR | 102 |
| AZ | 330 |
| CA | 1201 |
| CO | 148 |

As we see here, the number of fast food restaurants recorded varies widely from state to state. Because the data covers only 10,000 restaurants (and we know the US has more), it is not complete; we will treat it as a sample for understanding broader trends. Alabama is a clear example of missing data: only 6 restaurants are recorded there, far fewer than actually exist.
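One way to look for other suspiciously low counts like Alabama's is to sort the state table in ascending order. A base R sketch using only the six rows printed above, stored in a separate hypothetical state_sample so it does not overwrite state_ff; on the full table, dplyr's arrange(state_ff, number) does the same:

```r
# The six state counts printed above (only part of the full table)
state_sample <- data.frame(
  state  = c("AK", "AL", "AR", "AZ", "CA", "CO"),
  number = c(16, 6, 102, 330, 1201, 148)
)

# Sort ascending so the sparsest states come first
sparse_first <- state_sample[order(state_sample$number), ]
head(sparse_first, 3)
```

A low count is only a prompt for a closer look: small states such as Alaska would be expected to have fewer restaurants even with complete data.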

Using the above state data, we can now create a barplot to visualize which states have the most fast food restaurants.

ggplot(data = state_ff, aes(x = reorder(state, -number), y = number)) +
  geom_col(fill = "pink") +   # same as geom_bar(stat = "identity")
  labs(title = "Number of fast food restaurants per state", x = "State", y = "Number of restaurants") +
  theme(axis.text.x = element_text(angle = 90))

Looking at the barplot above, we see that California far exceeds every other state, with 1,201 fast food restaurants. The states with the next-highest counts are Texas, Florida, Ohio, Georgia, and Illinois.
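To put California's lead in proportion, its count can be expressed as a share of the whole sample. A minimal sketch, assuming every one of the 10,000 rows has a province value so that the state counts sum to 10,000; share_pct is a hypothetical helper, and on the full table 100 * state_ff$number / sum(state_ff$number) avoids that assumption:

```r
# Hypothetical helper: percentage of the sample located in one state.
# Assumes the state counts sum to the full 10,000 rows.
share_pct <- function(number, total = 10000) 100 * number / total

share_pct(1201)  # California: about 12% of all restaurants in the sample
```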

Now let’s look at this information on a map to better visualize the geographic spread.

plot_usmap(data = state_ff, values = "number") +
  scale_fill_continuous(low = "white", high = "blue", name = "Number of fast food restaurants",
                        labels = scales::comma) +
  theme(legend.position = "right") +
  labs(title = "Number of fast food restaurants across the US")

In this map, the darker the color, the more fast food restaurants a state has. California stands out instantly as the only state at the darkest end of the scale. We then notice states like Texas, Florida, and Ohio, which the barplot also showed to have high numbers of fast food restaurants.

The fact that California, Texas, and Florida have the most fast food restaurants in this data points to a possible confounding variable: these are also among the most populous states. Question 3 examines whether population explains why these states top the list.

Conclusion

It is clear, both from everyday life and from this data, that fast food is a very popular type of establishment in the United States today. We constantly pass different chains as we drive through cities. Through this analysis, we were able to take a deeper dive into the characteristics of fast food restaurants in modern America.

Our first question asked which fast food restaurants are the most popular in the US. For this question, popular was treated as synonymous with abundant. This simplification may not reflect the true popularity and sentiment of customers, though it is fair to assume that a chain that wasn’t very popular would not have so many locations. From our analysis we learned that McDonald’s is by far the most popular fast food restaurant. Other top competitors included Taco Bell, Burger King, Subway, and Arby’s. We also learned that, when breaking the data down by state, only 6 different chains were the most popular chain in at least one state. Once again, McDonald’s took the top spot. Chains other than McDonald’s came out on top most often in the middle states of America.
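The state-level result depends on finding the most common chain in each state. The code for question one is not part of this section, so the following is only a base R sketch of one way to do it, run on made-up toy rows rather than the project's data; with dplyr, count() followed by a grouped slice_max() would be the usual route:

```r
# Toy rows for illustration only (not the project's data)
toy <- data.frame(
  province = c("LA", "LA", "LA", "TN", "TN", "TN"),
  name     = c("sonic drive in", "mcdonalds", "mcdonalds",
               "taco bell", "arbys", "taco bell"),
  stringsAsFactors = FALSE
)

# Count restaurants for every (state, chain) pair and drop empty pairs
counts <- as.data.frame(table(province = toy$province, name = toy$name),
                        stringsAsFactors = FALSE)
counts <- counts[counts$Freq > 0, ]

# Within each state put the largest count first, then keep one row per state
counts <- counts[order(counts$province, -counts$Freq), ]
top_chain <- counts[!duplicated(counts$province), c("province", "name", "Freq")]

top_chain
length(unique(top_chain$name))  # number of distinct chains that lead some state
```

If two chains tie for first place in a state, this keeps whichever happens to sort first, so ties would need a deliberate rule on real data.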

Our next question looked at the locations of fast food restaurants, asking in which US location (state) fast food is most abundant. While the data was imperfect, we did discover some key trends. California far exceeded all other states in the number of fast food restaurants (~1,200). Other states with very large numbers of restaurants include Texas, Florida, and Ohio. These findings raised a concern that led into the next question: is the number of fast food restaurants related to population? Based on California, Texas, and Florida alone, before doing any analysis for question three, the answer seemed to be yes.

As stated above, question three investigated whether the number of fast food restaurants in a state is related to that state’s population. The results of question two had already suggested that it was. Using state population data and the state restaurant counts from question two, we found a strong linear relationship between the two variables. Knowing this, we adjusted the findings from question two to look at which states have the most fast food restaurants per capita. Frontrunners here included Wyoming, Arizona, and South Dakota, a very different list from the one before controlling for population.
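The two steps described here, measuring the linear relationship and then adjusting for population, can be sketched in a few lines of base R. The population data and the question three code are not part of this section, so the vectors below are synthetic numbers chosen only to show the mechanics, and the per-100,000 scale is an assumed choice:

```r
# Synthetic inputs for illustration only, not real state data
restaurants <- c(10, 25, 40, 80)
population  <- c(1e6, 2e6, 5e6, 8e6)

# Strength of the linear relationship (Pearson correlation) and a fitted line
r   <- cor(restaurants, population)
fit <- lm(restaurants ~ population)

# Restaurants per 100,000 residents, then rank from highest to lowest
per_100k <- restaurants / population * 1e5
order(per_100k, decreasing = TRUE)
```

A high correlation says raw counts mostly track population, which is why ranking by the per-capita rate can reorder the states so sharply.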

Overall, this analysis did not produce any groundbreaking discoveries, but it does help us understand something common in the world around us. Fast food certainly isn’t going anywhere for now, and as we’ve seen, it is incredibly popular across the country. McDonald’s takes the lead, but perhaps some key competitors will break through in the future.